This document comprises the first attempt to fit a multivariate functional linear regression (mFLR) model, including performing FPCAs for functional variables The available covariates can be classified into four different types:
The first category is not modeled in this approach since there is need for further thinking. The PFT-dependent variables are transformed to PFT-wise covariates to match the general data shape (1803 observations, i.e., disturbed grid cells). For every disturbed grid cell, the soil related variables are already provided, so no further pre-processing is necessary.
The time-varying, location dependent variables, i.e., the climate covariates and total nitrogen uptake are functional as the response. This implies that these covariates need to be transformed into multivariate data. In particular, in this first approach, the two climate variables \(tas\_yearlymean\) and \(pr\_yearlysum\) are represented as PC scores in the final model formulation, which result from two univariate FPCAs. Details on that are derived in the following.
The first step in functional data analysis is to find an appropriate representation for the functional data. Figure 1 shows the functional fit for the average annual temperature (\(tas\_yearlymean\)). Note that there is no basis representation or smoother involved here. The black line indicates the mean curve. According to the curves, there is a slight trend towards warmer temperatures in the later decades of the study period. But this result is less meaningful considering the curves comprise all four scenarios.
Figure 2 shows the functional fit for the annual average temperature for each scenario. As expected, with rising radial forcing, the annual mean temperature increases.
Figure 3 visualizes the annual sum of precipitation as function for each disturbed grid cell. The overall mean function indicates no trend in time across the study period.
As before, Figure 4 shows the scenario-wise functional fits of precipitation. Interestingly, one may hypothesize, that the more radial forcing, the less variation in precipitation.
In general, all plots indicated that most of the variation in the data is captured by horizontal lines. This is proved by considering the principal components in the next section.
Performing a FPCA or MFPCA using the R package includes steps within the functions which are not obvious to the user. For instance, the basis for the basis representation is set automatically, and the functional fits cannot be further explored. There is need for diving deeper into the functionality of this package to fully understand the basis representation used for running the FPCA or MFPCA.
Figure 5 and 6 show the first two principal components for both annual mean temperature and annual precipitation, respectively. As already indicated in the figures above, nearly all the variance can be captured by \(y=0\).
Note that nearly all the variance in the data is already captured by the first PC for both variables.
Questions here: What do I have to look at when I want to see that PC 1 and PC 2 are orthogonal to each other?
The performed FPCAs serve as dimensionality reducer and smoothing operator. This becomes clear when looking at the reconstructed curves using the first 10 PCs visualized in Figure 7.
With these results, all provided covariates are prepared for modeling in the next step.
Combining all covariates together results in the following model formula:
\(\begin{pmatrix}\text{PC1} \\\text{PC2}\end{pmatrix}= \ \begin{pmatrix}\beta_{0,1} \\\beta_{0,2}\end{pmatrix}+\begin{pmatrix}\beta_{1,1} \\ \beta_{1,2}\end{pmatrix} \cdot \text{Lon}+\begin{pmatrix}\beta_{2,1} \\ \beta_{2,2}\end{pmatrix} \cdot \text{Lat}+\begin{pmatrix}\beta_{3,1} \\ \beta_{3,2}\end{pmatrix} \cdot \text{sand_fraction} \\ +\begin{pmatrix}\beta_{4,1} \\ \beta_{4,2}\end{pmatrix} \cdot\text{silt_fraction}+\begin{pmatrix}\beta_{5,1} \\ \beta_{5,2}\end{pmatrix} \cdot \text{bulkdensity_soil}+\begin{pmatrix}\beta_{6,1} \\ \beta_{6,2}\end{pmatrix} \cdot \text{ph_soil} \\ +\begin{pmatrix}\beta_{7,1} \\ \beta_{7,2}\end{pmatrix} \cdot \text{soilcarbon}+\begin{pmatrix}\beta_{8,1} \\ \beta_{8,2}\end{pmatrix} \cdot \text{time_since_dist}\\+\begin{pmatrix}\beta_{9,1} \\ \beta_{9,2}\end{pmatrix} \cdot\text{Scenario}_{\text{SSP1-RCP2.6}} + \begin{pmatrix}\beta_{10,1} \\ \beta_{10,2}\end{pmatrix} \cdot\text{Scenario}_{\text{SSP3-RCP7.0}}+ \begin{pmatrix}\beta_{11,1} \\ \beta_{11,2}\end{pmatrix} \cdot\text{Scenario}_{\text{SSP5-RCP8.5}} \\+\begin{pmatrix}\beta_{12,1} \\ \beta_{12,2}\end{pmatrix} \cdot \text{initial_recruitment_BNE}+\begin{pmatrix}\beta_{13,1} \\ \beta_{13,2}\end{pmatrix} \cdot\text{initial_recruitment_IBS}+\begin{pmatrix}\beta_{14,1} \\ \beta_{14,2}\end{pmatrix} \cdot\text{initial_recruitment_TeBS} \\ +\begin{pmatrix}\beta_{15,1} \\ \beta_{15,2}\end{pmatrix} \cdot\text{initial_recruitment_Tundra}+\begin{pmatrix}\beta_{16,1} \\ \beta_{16,2}\end{pmatrix} \cdot\text{initial_recruitment_otherC}\\+\begin{pmatrix}\beta_{17,1} \\ \beta_{17,2}\end{pmatrix} \cdot\text{recruitment_ten_years_BNE} +\begin{pmatrix}\beta_{18,1} \\ \beta_{18,2}\end{pmatrix} \cdot\text{recruitment_ten_years_IBS}+\begin{pmatrix}\beta_{19,1} \\ \beta_{19,2}\end{pmatrix} \cdot\text{recruitment_ten_years_TeBS}\\+\begin{pmatrix}\beta_{20,1} \\ \beta_{20,2}\end{pmatrix} \cdot\text{recruitment_ten_years_Tundra} +\begin{pmatrix}\beta_{21,1} \\ \beta_{21,2}\end{pmatrix} \cdot\text{recruitment_ten_years_otherC}\\+\begin{pmatrix}\beta_{22,1} \\ \beta_{22,2}\end{pmatrix} \cdot\text{previous_state_BNE}+\begin{pmatrix}\beta_{23,1} \\ \beta_{23,2}\end{pmatrix} \cdot \text{previous_state_IBS} +\begin{pmatrix}\beta_{24,1} \\ \beta_{24,2}\end{pmatrix} \cdot \text{previous_state_TeBS}\\+\begin{pmatrix}\beta_{25,1} \\ \beta_{25,2}\end{pmatrix} \cdot \text{previous_state_Tundra}+\begin{pmatrix}\beta_{26,1} \\ \beta_{26,2}\end{pmatrix} \cdot \text{previous_state_otherC}\\ +\begin{pmatrix}\beta_{27,1} \\ \beta_{27,2}\end{pmatrix} \cdot \text{PC1_temp}+\begin{pmatrix}\beta_{28,1} \\ \beta_{28,2}\end{pmatrix} \cdot \text{PC1_precip}+\begin{pmatrix}\epsilon_1\\\epsilon_2\end{pmatrix}\)
Since the coil composition adds up to 1, one of the three composition is left out (\(clay\_fraction\)). Note that so far only linear effects are considered. All variables but \(tas\_yearlymin\), \(tas\_yearlymax\) and both nitrogen uptake variables (\(Nuptake\) and \(Nuptake\_total\)) are considered here.
The model summary including the effect estimates are given here:
## Response PC1 :
##
## Call:
## lm(formula = PC1 ~ Lon + Lat + sand_fraction + silt_fraction +
## bulkdensity_soil + ph_soil + soilcarbon + time_since_dist +
## Scenario + initial_recruitment_BNE + initial_recruitment_IBS +
## initial_recruitment_TeBS + initial_recruitment_Tundra + initial_recruitment_otherC +
## recruitment_ten_years_BNE + recruitment_ten_years_IBS + recruitment_ten_years_TeBS +
## recruitment_ten_years_Tundra + recruitment_ten_years_otherC +
## previous_state_BNE + previous_state_IBS + previous_state_TeBS +
## previous_state_Tundra + previous_state_otherC + PC1_temp +
## PC1_precip, data = d_train %>% select(-clay_fraction))
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.6860 -1.9649 -0.0065 1.9652 7.4668
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.199e+01 1.962e+00 -11.209 < 2e-16 ***
## Lon -6.785e-03 8.459e-04 -8.021 2.18e-15 ***
## Lat 2.605e-01 1.630e-02 15.980 < 2e-16 ***
## sand_fraction 2.192e+01 1.223e+00 17.913 < 2e-16 ***
## silt_fraction 1.410e+01 2.113e+00 6.671 3.64e-11 ***
## bulkdensity_soil -3.708e+00 7.292e-01 -5.085 4.16e-07 ***
## ph_soil -3.083e-01 1.710e-01 -1.803 0.07155 .
## soilcarbon -4.044e-01 6.431e-02 -6.287 4.29e-10 ***
## time_since_dist 4.367e-04 5.782e-04 0.755 0.45029
## ScenarioSSP1-RCP2.6 -1.794e+00 2.126e-01 -8.439 < 2e-16 ***
## ScenarioSSP3-RCP7.0 -1.308e+00 2.409e-01 -5.430 6.62e-08 ***
## ScenarioSSP5-RCP8.5 -1.343e+00 2.590e-01 -5.183 2.50e-07 ***
## initial_recruitment_BNE 1.083e-02 2.404e-03 4.506 7.14e-06 ***
## initial_recruitment_IBS 2.792e-03 4.060e-03 0.688 0.49175
## initial_recruitment_TeBS 4.895e-02 1.832e-02 2.672 0.00763 **
## initial_recruitment_Tundra -9.698e-03 3.652e-03 -2.655 0.00801 **
## initial_recruitment_otherC -9.606e-03 4.702e-03 -2.043 0.04123 *
## recruitment_ten_years_BNE -3.910e-04 1.730e-04 -2.260 0.02398 *
## recruitment_ten_years_IBS 1.259e-03 6.809e-04 1.850 0.06458 .
## recruitment_ten_years_TeBS -6.397e-03 3.263e-03 -1.960 0.05014 .
## recruitment_ten_years_Tundra 3.390e-04 2.358e-04 1.438 0.15074
## recruitment_ten_years_otherC 6.479e-04 1.414e-03 0.458 0.64684
## previous_state_BNE 4.751e-03 2.538e-02 0.187 0.85152
## previous_state_IBS -7.149e-03 1.818e-02 -0.393 0.69414
## previous_state_TeBS 1.656e-01 1.141e-01 1.451 0.14689
## previous_state_Tundra 5.425e-03 1.529e-01 0.035 0.97171
## previous_state_otherC -1.832e-01 3.789e-02 -4.834 1.48e-06 ***
## PC1_temp 6.787e-02 3.770e-03 18.002 < 2e-16 ***
## PC1_precip -6.060e-04 3.770e-05 -16.074 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.632 on 1414 degrees of freedom
## Multiple R-squared: 0.6317, Adjusted R-squared: 0.6244
## F-statistic: 86.61 on 28 and 1414 DF, p-value: < 2.2e-16
##
##
## Response PC2 :
##
## Call:
## lm(formula = PC2 ~ Lon + Lat + sand_fraction + silt_fraction +
## bulkdensity_soil + ph_soil + soilcarbon + time_since_dist +
## Scenario + initial_recruitment_BNE + initial_recruitment_IBS +
## initial_recruitment_TeBS + initial_recruitment_Tundra + initial_recruitment_otherC +
## recruitment_ten_years_BNE + recruitment_ten_years_IBS + recruitment_ten_years_TeBS +
## recruitment_ten_years_Tundra + recruitment_ten_years_otherC +
## previous_state_BNE + previous_state_IBS + previous_state_TeBS +
## previous_state_Tundra + previous_state_otherC + PC1_temp +
## PC1_precip, data = d_train %>% select(-clay_fraction))
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.1257 -0.2266 0.2996 0.7167 8.0351
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.101e+00 1.093e+00 1.923 0.054667 .
## Lon -4.370e-05 4.711e-04 -0.093 0.926101
## Lat -3.391e-02 9.081e-03 -3.734 0.000196 ***
## sand_fraction -3.264e-01 6.814e-01 -0.479 0.632003
## silt_fraction -2.079e+00 1.177e+00 -1.766 0.077555 .
## bulkdensity_soil 3.047e-01 4.061e-01 0.750 0.453218
## ph_soil 3.198e-02 9.523e-02 0.336 0.737043
## soilcarbon 3.124e-02 3.582e-02 0.872 0.383254
## time_since_dist -2.938e-04 3.220e-04 -0.912 0.361725
## ScenarioSSP1-RCP2.6 4.812e-01 1.184e-01 4.064 5.09e-05 ***
## ScenarioSSP3-RCP7.0 7.720e-01 1.342e-01 5.753 1.07e-08 ***
## ScenarioSSP5-RCP8.5 7.207e-01 1.443e-01 4.995 6.60e-07 ***
## initial_recruitment_BNE 3.850e-04 1.339e-03 0.288 0.773740
## initial_recruitment_IBS -8.364e-03 2.261e-03 -3.699 0.000225 ***
## initial_recruitment_TeBS -1.302e-01 1.020e-02 -12.762 < 2e-16 ***
## initial_recruitment_Tundra 2.016e-03 2.034e-03 0.991 0.321727
## initial_recruitment_otherC -3.387e-03 2.618e-03 -1.293 0.196055
## recruitment_ten_years_BNE 7.771e-05 9.637e-05 0.806 0.420173
## recruitment_ten_years_IBS 1.375e-03 3.792e-04 3.625 0.000299 ***
## recruitment_ten_years_TeBS 1.821e-02 1.817e-03 10.018 < 2e-16 ***
## recruitment_ten_years_Tundra -2.332e-04 1.313e-04 -1.776 0.075953 .
## recruitment_ten_years_otherC 1.117e-04 7.874e-04 0.142 0.887213
## previous_state_BNE -3.093e-02 1.413e-02 -2.188 0.028800 *
## previous_state_IBS -3.948e-03 1.012e-02 -0.390 0.696581
## previous_state_TeBS -3.350e-01 6.354e-02 -5.272 1.56e-07 ***
## previous_state_Tundra -2.055e-01 8.518e-02 -2.412 0.015987 *
## previous_state_otherC -9.794e-02 2.110e-02 -4.642 3.78e-06 ***
## PC1_temp 2.264e-02 2.100e-03 10.782 < 2e-16 ***
## PC1_precip 3.524e-05 2.100e-05 1.678 0.093484 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.466 on 1414 degrees of freedom
## Multiple R-squared: 0.3486, Adjusted R-squared: 0.3357
## F-statistic: 27.03 on 28 and 1414 DF, p-value: < 2.2e-16
For a valid interpretation, the first two PCs of the target are depicted at the end of this document.
Questions here: Is there any possibility to trace back the estimates on single PFTs? For instance, in PC1 we can see, the more extreme the scenario, the larger the \(scenario\) estimate. Looking at the PCs at the bottom indicates more BNE, less IBS etc. But what happens for the single PFTs? E.g. I would suggest that the higher the temperature, the more TeBS, but how could I see that in the model? Is it even possible?
Next, the model is evaluated. Figure 6 shows residual and Q-Q plots for both PCs.
Prior to running the model, the data was split into test and train data sets. In order to evaluate, how good the model performs on unseen data, Figure 9 shows the true PC scores vs. the predicted PC scores for the test set.
The basis functions estimated by the MFPCA for the target can now be used to transform the predicted PC scores of the unseen data into functional data. As an example, Figure shows for four randomly picked disturbed grid cells, one for each scenario, the true functional fit and the predicted one from the model.
To conclude, this first attempt seems rather promising, but there is need for improvement.
Possible adaptations: